Skip to content

perf: Optimize settings loading with bulk JSON decoding (3.8x faster)#12

Merged
github-actions[bot] merged 8 commits intomsgflux:mainfrom
vilsonrodrigues:feat/optimize-with-struct
Nov 27, 2025
Merged

perf: Optimize settings loading with bulk JSON decoding (3.8x faster)#12
github-actions[bot] merged 8 commits intomsgflux:mainfrom
vilsonrodrigues:feat/optimize-with-struct

Conversation

@vilsonrodrigues
Copy link
Contributor

Summary

Major performance optimization refactoring BaseSettings to use msgspec.defstruct and bulk JSON decoding instead of field-by-field validation.

Performance Results

  • Before: 0.933ms per load (sequential Python validation)
  • After: 0.702ms per load (bulk C-level JSON decode)
  • Improvement: 33% faster vs previous implementation
  • vs Pydantic: 3.8x faster than pydantic-settings 🚀

Key Optimizations

  1. Bulk JSON Decoding: All validation in C via msgspec.json.decode
  2. Cached Encoders/Decoders: Reuse instances to eliminate instantiation overhead
  3. Automatic Field Ordering: Required fields before optional (prevents defstruct errors)
  4. Fixed Optional Bug: Correct Union type detection for Optional[T]

Architecture

Before (Sequential):

for field in fields:
    value = msgspec.convert(env_value, field_type)  # Python loop, slow

After (Bulk):

json_bytes = encoder.encode(all_values)  # Cached encoder
return decoder.decode(json_bytes)  # Cached decoder, all in C!

Testing

22 comprehensive unit tests covering:

  • Basic settings, env loading, type conversion
  • Optional fields, .env files, validation
  • Serialization methods, edge cases

5 practical examples:

  1. Basic usage
  2. Environment prefixes
  3. .env files
  4. Advanced types
  5. Serialization

All tests pass: 22/22 ✅
Test suite: 2x faster (0.10s → 0.05s)

Bug Fixes

Optional Type Detection

# Before (broken):
if origin is type(None):  # Never true!

# After (correct):
if origin is Union:  # ✅ Correctly detects Optional[T]
    non_none = [a for a in args if a is not type(None)]
    if len(non_none) == 1:
        field_type = non_none[0]

Field Ordering

Automatically orders required before optional to prevent defstruct errors.

Changes Summary

Core optimization commits:

  • Initial bulk JSON decoding refactor
  • Encoder/decoder caching
  • Fixed Optional type bug
  • Automatic field ordering

Testing & docs:

  • 22 comprehensive unit tests
  • 5 practical examples with README
  • Updated benchmark results
  • Cleaned up unused code

No Breaking Changes

Same API, just faster:

class AppSettings(BaseSettings):
    name: str
    port: int = 8000

settings = AppSettings()  # Now 33% faster! ⚡

🎯 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

vilsonrodrigues and others added 8 commits November 27, 2025 00:11
Major performance optimization by refactoring BaseSettings to use
msgspec.defstruct and bulk JSON decoding instead of field-by-field
validation.

## Key Changes

**Architecture:**
- BaseSettings now acts as a wrapper factory using __new__
- Dynamically creates msgspec.Struct classes via defstruct
- Returns native Struct instances (maintains full compatibility)

**Optimization Strategy:**
1. Collect all environment variables at once
2. Preprocess string values to JSON-compatible types
3. Use msgspec.json.encode() + msgspec.json.decode() for bulk validation
4. All validation and type conversion happens in C (not Python)

**Performance Improvement:**
- Before: 0.933ms per settings load (sequential validation)
- After: 0.685ms per settings load (bulk JSON decode)
- **36% faster** (1.36x speedup) 🚀

## Benefits

- ✅ **Faster**: Bulk validation in C vs Python loops
- ✅ **Compatible**: API remains unchanged (Settings() still works)
- ✅ **Clean**: Leverages msgspec's native performance
- ✅ **Maintainable**: Simpler code with less custom validation logic

## Implementation Details

- Uses msgspec.defstruct() to create Struct classes dynamically
- Injects helper methods (model_dump, model_dump_json, schema)
- Caches Struct classes to avoid repeated creation
- Handles type conversion (bool, int, float, JSON types)
- Maintains support for env_prefix, case_sensitive, .env files

## Testing

- ✅ All existing tests pass
- ✅ New implementation tested with various field types
- ✅ Benchmark shows 36% performance improvement

This optimization maintains the familiar pydantic-like API while
maximizing msgspec's performance advantages.

🎯 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove import json (not needed, using msgspec for everything)
- Keep imports clean and minimal
## Tests (22 test cases)
- ✅ Basic settings creation with defaults
- ✅ Environment variable loading and type conversion
- ✅ Boolean conversion (true/false/1/0/yes/no variants)
- ✅ Environment prefixes (env_prefix)
- ✅ .env file loading
- ✅ Optional fields (str | None)
- ✅ Complex types (lists, dicts from JSON env vars)
- ✅ Validation errors (missing required, wrong types)
- ✅ Case sensitivity handling
- ✅ model_dump(), model_dump_json(), schema() methods
- ✅ Struct instance verification
- ✅ Class caching
- ✅ Env var priority (defaults < env < explicit)

## Examples (5 practical examples)
1. **01_basic_usage.py** - Fundamentals of settings management
2. **02_env_prefix.py** - Using env_prefix for namespacing
3. **03_dotenv_file.py** - Loading from .env files
4. **04_advanced_types.py** - Complex types (Optional, lists, dicts)
5. **05_serialization.py** - Serialization and schema generation

Each example includes:
- Runnable code with clear output
- Best practices and tips
- Real-world use cases

## Test Coverage
All core functionality tested:
- Environment loading ✓
- Type conversion ✓
- Validation ✓
- Serialization ✓
- Edge cases ✓

All tests pass: 22/22 ✅
All examples run successfully ✅
## Performance Optimizations

### 1. Cached Encoders and Decoders
- Reuse `msgspec.json.Encoder` and `msgspec.json.Decoder` instances
- Avoid repeated instantiation overhead
- `_encoder_cache` and `_decoder_cache` as class variables

### 2. Automatic Field Ordering
- Required fields now automatically placed before optional fields
- Prevents "Required field cannot follow optional fields" error
- Safer and more robust struct creation

### 3. Fixed Optional Type Bug
- Corrected Union type detection logic
- Changed from `origin is type(None)` (never true) to `origin is Union`
- Properly unwraps `Optional[T]` → `Union[T, NoneType]` → `T`
- Example: `Optional[int]` now correctly detected and unwrapped

## Code Quality

- Added `Union` import from typing
- Improved error handling with chained exceptions
- Better comments explaining the optimizations

## Testing

- ✅ All 22 tests pass
- ✅ Tests run 2x faster (0.10s → 0.05s)
- ✅ Benchmark maintains performance: 0.702ms per load
- ✅ Examples still work correctly

## Technical Details

**Before (Union bug):**
```python
origin = get_origin(field_type)
if origin is type(None) or origin is type(int | None):  # Never true!
    ...
```

**After (correct):**
```python
origin = get_origin(field_type)
if origin is Union:  # Correctly detects Optional[T]
    args = get_args(field_type)
    non_none_types = [a for a in args if a is not type(None)]
    if len(non_none_types) == 1:
        field_type = non_none_types[0]
```

**Field ordering:**
```python
# Before: Mixed order could cause errors
fields = [(name, type, default), ...]

# After: Required first, then optional
required_fields = [(name, type), ...]
optional_fields = [(name, type, default), ...]
fields = required_fields + optional_fields
```

These optimizations make the code more robust while maintaining peak performance.
- Updated benchmark from 0.933ms to 0.702ms (33% improvement)
- msgspec-ext now 3.8x faster than pydantic-settings (was 2.9x)
- Added key optimizations list to performance section
- Updated comparison table with new performance numbers
- Changed Python version reference from 3.13 to 3.12 (actual test env)
- Remove dynaconf from benchmark comparisons (not used)
- Remove unused imports (tempfile, Path)
- Update docstring to reflect current comparisons
- Add .benchmarks/ to gitignore (pytest-benchmark cache directory)
- Simplify benchmark output to focus on msgspec-ext vs pydantic
- Add ClassVar annotations for class-level caches
- Import ClassVar from typing
- Fix docstring formatting (D212)
- Remove unused _apply_defaults method
- All checks pass for src/ directory

Lint results:
- Before: 4 errors in src/
- After: 0 errors (all checks passed)

All 22 tests still passing ✅
- Add S104 (binding to all interfaces) ignore for examples/tests
- Add F401 (unused imports) ignore for examples/tests
- Add PLC0415 (top-level import) ignore for tests
- Run ruff format on all files (3 files reformatted)
- Update pyproject.toml per-file-ignores

All checks now pass ✅
All 22 tests passing ✅
@vilsonrodrigues
Copy link
Contributor Author

/merge

@github-actions github-actions bot merged commit 5fa229d into msgflux:main Nov 27, 2025
7 checks passed
@github-actions
Copy link

✅ PR merged successfully by @vilsonrodrigues!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant